Automatic Extraction of Definitions in Portuguese: A Rule-Based Approach

Authors

Rosa Del Gaudio

António Branco

Abstract

In this paper we present a rule-based system for the automatic extraction of definitions from Portuguese texts. As input, the system takes text that has previously been annotated with morpho-syntactic information, namely POS and inflection features. It handles three types of definitions, distinguished by the connector between definiendum and definiens: the so-called copula verb “to be”, a verb other than the copula, or punctuation marks. The primary goal of this system is to act as a tool for supporting glossary construction in e-learning management systems. It was tested on a collection of texts that can be taken as learning objects, in three different domains: information society, computer science for non-experts, and e-learning. Evaluation results are presented for each of these domains and for each type of definition. On average, we obtain 14% for precision, 86% for recall and 0.33 for F2 score.
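To make the three connector types concrete, here is a minimal sketch of how a sentence could be split at its connector, given POS-tagged input. It is not the authors' grammar: the token format `(form, lemma, tag)`, the coarse tags `V` and `PUNCT`, the set `CONNECTOR_PUNCT`, and the function `split_definition` are all assumptions made for illustration. The rule is deliberately naive and treats the first verb or connector punctuation mark as the connector.

```python
from typing import List, Optional, Tuple

# Hypothetical token format: (word form, lemma, coarse POS tag).
# The tags "V" and "PUNCT" are illustrative, not the paper's tagset.
Token = Tuple[str, str, str]

# Assumed set of punctuation marks that may act as a connector.
CONNECTOR_PUNCT = {":", "-"}


def split_definition(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Split a tagged sentence at its first connector.

    Returns (connector type, definiendum, definiens), or None when no
    connector has material on both sides.
    """
    for i, (form, lemma, pos) in enumerate(tokens):
        # A connector needs a definiendum to its left and a definiens to its right.
        if i == 0 or i == len(tokens) - 1:
            continue
        if pos == "V":
            # "ser" is the Portuguese copula ("to be").
            kind = "copula" if lemma == "ser" else "other_verb"
        elif pos == "PUNCT" and form in CONNECTOR_PUNCT:
            kind = "punctuation"
        else:
            continue
        definiendum = " ".join(t[0] for t in tokens[:i])
        definiens = " ".join(t[0] for t in tokens[i + 1:])
        return kind, definiendum, definiens
    return None


# "Um glossário é uma lista de termos" (A glossary is a list of terms).
copula_sentence = [
    ("Um", "um", "DET"), ("glossário", "glossário", "N"), ("é", "ser", "V"),
    ("uma", "um", "DET"), ("lista", "lista", "N"), ("de", "de", "PREP"),
    ("termos", "termo", "N"),
]
copula_result = split_definition(copula_sentence)

# "HTML: linguagem de marcação" (HTML: markup language).
punct_sentence = [
    ("HTML", "HTML", "N"), (":", ":", "PUNCT"), ("linguagem", "linguagem", "N"),
    ("de", "de", "PREP"), ("marcação", "marcação", "N"),
]
punct_result = split_definition(punct_sentence)
```

A rule this loose accepts any sentence containing a verb, so a realistic grammar would add constraints on what may appear on either side of the connector.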


Similar resources

Using Wikipedia to Collect a Corpus for Automatic Definition Extraction: Comparing English and Portuguese Languages

Systems for the detection and extraction of definitions are being developed for different purposes, such as glossary creation [5, 3], lexical databases [6], ontologies [2], question answering [1], etc. All these systems use annotated corpora to build a set of rules or patterns capable of identifying a definition in a different text. The basic structure of a definition should resemble an equation...


DEFINDER: Rule-based Methods for the Extraction of Medical Terminology and their Associated Definitions from On-line Text

The problem addressed in this paper concerns the automatic identification and extraction of medical terms along with their definitions and modifiers from full text consumer-oriented medical articles. The system, DEFINDER (Definition Finder), uses rule-based techniques. The output of our system can be used in several applications: creation and/or enhancement of on-line terminologica...


Development of an Automatic Land Use Extraction System in Urban Areas using VHR Aerial Imagery and GIS Vector Data

The lack of detailed land use (LU) information and of efficient data collection methods has made the modeling of urban systems difficult. This study aims to develop a novel hierarchical rule-based LU extraction framework using geographic vector and remotely sensed (RS) data, in order to extract detailed subzonal LU information, residential LU in this study. The LU extraction system is developed to ex...


Discovering grammar rules for Automatic Extraction of Definitions

Automatic extraction of definitions from text documents can be very useful in various scenarios, especially in eLearning systems. In this paper, we propose an approach aimed at assisting the discovery of grammar rules which can be used to identify definitions, using Genetic Algorithms and Genetic Programming. By categorising definitions to enable the learning of more specialised grammars, we en...


Automatic Extraction Of Definitions From German Court Decisions

This paper deals with the use of computational linguistic analysis techniques for information access and ontology learning within the legal domain. We present a rule-based approach for extracting and analysing definitions from parsed text and evaluate it on a corpus of about 6000 German court decisions. The results are applied to improve the quality of a text-based ontology learning method on t...


Publication date: 2007
